A 108 Gbps, 1.5 GHz 1D-DCT Architecture
Authors
Abstract
A high-performance 1D-DCT architecture is proposed. It is based on the New Distributed Arithmetic Architecture (NEDA) algorithm [1]. Enhancements to NEDA are proposed to reduce the number of computations. Only addition operations are used, with 42 additions needed to compute the outputs of an 8x1 DCT. No subtractions, multiplications, or ROM are needed. High throughput is achieved by pipelining the architecture. In every clock cycle, it receives eight pixels (9 bits each) as inputs and produces eight DCT coefficients (14 bits each). The delay of one pipeline stage is the delay of a 3-level 4:2 compressor tree. The architecture is implemented in 0.35 μm technology; it runs at 1.5 GHz and processes 108 Gbps of image/video sequence data.

1. Introduction

Applications that process images and video sequences are data intensive. Storing or transmitting such amounts of data in raw form is impractical. Data compression techniques were introduced to decrease the amount of data stored or transmitted without significantly affecting its quality. The most commonly used compression technique is block-based transform coding. A block (group of pixels) of the image/video frame is transformed from the spatial domain to another domain (usually the frequency domain) in order to obtain a more compact representation. The image/video frame is stored or transmitted in compressed form to save storage or bandwidth. To retrieve or receive the information, it has to be transformed back to the spatial domain by the inverse transform.

The optimal transform is the Karhunen-Loeve transform (KLT): it packs most of the block energy into a small number of frequency-domain elements, it minimizes the total entropy of the block, and it completely decorrelates its elements [2]. The discrete cosine transform (DCT) was introduced by Ahmed et al. [3] in 1974. Its performance is close to that of the KLT at much lower complexity, which made it an attractive alternative.
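The 108 Gbps figure quoted in the abstract is consistent with the input side of the pipeline: eight 9-bit pixels accepted on every cycle of a 1.5 GHz clock. The short sketch below only redoes that arithmetic. The function and constant names are illustrative, and the output-side rate is derived here from the stated 14-bit coefficient width; it is not a number reported in the source.

```python
# Back-of-the-envelope check of the throughput stated in the abstract.
# All names are illustrative; the numeric values come from the abstract.

WORDS_PER_CYCLE = 8   # eight pixels in, eight coefficients out per clock cycle
INPUT_BITS = 9        # bits per input pixel
OUTPUT_BITS = 14      # bits per output DCT coefficient
CLOCK_HZ = 1.5e9      # 1.5 GHz clock


def throughput_gbps(words_per_cycle, bits_per_word, clock_hz):
    """Return the data rate in Gbps for a fully pipelined datapath."""
    return words_per_cycle * bits_per_word * clock_hz / 1e9


input_gbps = throughput_gbps(WORDS_PER_CYCLE, INPUT_BITS, CLOCK_HZ)
# Derived here, not stated in the source: rate at the coefficient outputs.
output_gbps = throughput_gbps(WORDS_PER_CYCLE, OUTPUT_BITS, CLOCK_HZ)

print(f"input rate:  {input_gbps:.0f} Gbps")
print(f"output rate: {output_gbps:.0f} Gbps")
```

The calculation assumes one new set of eight pixels enters the pipeline on every cycle, as the abstract describes.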
The KLT is shown to give the best compaction performance, but its biggest disadvantage is that its basis functions are image dependent, which complicates its implementation. The DCT is proven to be next in compaction efficiency, while having image-independent basis functions. Much research has been devoted to reducing the complexity of the DCT algorithm (the number of multiplications and additions required) and to building correspondingly efficient architectures [4]-[10].

The JPEG image compression standard, the H.261 (p×64) video-conferencing standard, the H.263 video-conferencing standard, and the MPEG (MPEG-1, MPEG-2, and MPEG-4) digital video standards use the DCT. In these standards, a 2D-DCT is applied to 8x8 blocks of pixels in the image/video frame. The 8x8 coefficients produced by the DCT are then quantized and coded to provide the actual compression. In typical images, most DCT coefficients for an 8x8 block of pixels are small and become zero after quantization. This property of the DCT on real-world images is critical to the compression schemes.

0-7695-0716-6/00 $10.0
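To make the 8x8 pipeline described above concrete, here is a minimal floating-point sketch of a 2D-DCT computed as 1D-DCTs over rows and then columns, followed by quantization. It is a reference model only and does not represent the adder-only NEDA architecture of the paper. The orthonormal DCT-II scaling, the single uniform quantization step of 16, and all function names are assumptions chosen for illustration; the standards named above use per-coefficient quantization tables instead.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed directly from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """2D-DCT by row-column decomposition: transform rows, then columns."""
    rows = [dct_1d(row) for row in block]
    # zip(*rows) walks the columns; each transformed column is stored as a row,
    # so the final zip(*...) transposes the result back.
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def quantize(coeffs, step=16):
    """Uniform quantization with one step size (an illustrative assumption)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A smooth 8x8 block: a gentle gradient, typical of flat image regions.
smooth_block = [[100 + 2 * i + 3 * j for j in range(8)] for i in range(8)]
quantized = quantize(dct_2d(smooth_block))
nonzero = sum(1 for row in quantized for q in row if q != 0)
print(f"nonzero coefficients after quantization: {nonzero} of 64")
```

For a smooth block like this one, only a handful of low-frequency coefficients survive quantization, which is the property the compression schemes rely on.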
Similar resources
10 Gbps 16QAM Transmission over a 70/80 GHz Radio Test System
A millimeter-wave radio test system is implemented which demonstrates 16QAM transmission over 70/80 GHz band for data rate up to 10 Gbps. Performance of the 16QAM transmitter and receiver is evaluated in a loop-back lab set-up. With the proposed 10 Gbps on single carrier system architecture, it is possible to achieve 40 Gbps over a 5 GHz bandwidth when combined with polarization and spatial mul...
Energy-Efficient Hardware Architecture for Variable N-point 1D DCT
This paper proposes an energy-efficient hardware acceleration architecture for the variable N-point 1D Discrete Cosine Transform (DCT) that can be leveraged if implementing MPEG-4’s Shape Adaptive DCT (SA-DCT) tool. The SA-DCT algorithm was originally formulated in response to the MPEG-4 requirement for object based texture coding, and is one of the most computationally demanding blocks in an M...
Three Dimensional DCT/IDCT Architecture
In this paper, the design and development of a new fully parallel architecture for the computation of the three-dimensional discrete cosine transform (3D DCT) is presented. It can be used for the computation of either the forward or the inverse 3D DCT and is suitable for real-time processing of 2D or multi-view video codecs. The computation of the 3D DCT is carried out using the row-column-frame...
An efficient architecture for the in place fast cosine transform
The two-dimensional discrete cosine transform (2D-DCT) is at the core of image encoding and compression applications. We present a new architecture for the 2D-DCT which is based on row-column decomposition. An efficient architecture to compute the one-dimensional fast direct (1D-DCT) and inverse cosine (1D-IDCT) transforms, which is based on reordering the butterflies after their computation, i...
Two Dimensional DCT/IDCT Architecture
A new fully parallel architecture for the computation of a two-dimensional (2D) discrete cosine transform (DCT), based on the row-column decomposition is presented. It uses the same one-dimensional (1D) DCT unit for the row and column computations and (N² + N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementat...